Automatic vectorization, in parallel computing, is a special case of automatic parallelization in which a computer program is converted from a scalar implementation, which processes a single pair of operands at a time, to a vector implementation, which performs one operation on multiple pairs of operands at once. For example, modern conventional computers, including specialized supercomputers, typically have vector operations that perform several operations simultaneously, such as four additions of the form c[i] = a[i] + b[i] at once. In most programming languages, however, one typically writes loops that perform such additions sequentially, for instance a C loop that adds a[i] and b[i] and stores the result in c[i] for each index i. A vectorizing compiler transforms such loops into sequences of vector operations, each of which performs the additions on a block of elements (of length four in this example) from the arrays a, b and c. Automatic vectorization is a major research topic in computer science.

==Background==
Early computers generally had one logic unit that executed one instruction on one operand pair at a time. Computer programs and programming languages were accordingly designed to execute sequentially. Modern computers can do many things at once, and many optimizing compilers therefore feature auto-vectorization: they transform particular parts of sequential programs into equivalent parallel ones, producing code that makes good use of a vector processor. Producing such code would be much simpler for a programming language designed for vector processors, but because much real-world code is written sequentially, the optimization is of great utility.

Loop vectorization transforms procedural loops that iterate over multiple pairs of data items by assigning a separate processing unit to each pair. Most programs spend most of their execution time within such loops.
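The transformation described above can be pictured with a small C sketch. It is only an illustration, not the output of any particular compiler: the function names add_scalar and add_blocked are hypothetical, and the second function spells out by hand the block-of-four structure that a vectorizer would typically express with real vector instructions.

```c
#include <stddef.h>

/* Scalar form: one addition per loop iteration. */
void add_scalar(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Hand-written picture of the vectorized form (an assumption for
 * illustration): the main loop handles blocks of four independent
 * additions, which a vector unit could execute as one operation,
 * and a scalar loop finishes the remaining 0 to 3 elements. */
void add_blocked(const float *a, const float *b, float *c, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        c[i]     = a[i]     + b[i];
        c[i + 1] = a[i + 1] + b[i + 1];
        c[i + 2] = a[i + 2] + b[i + 2];
        c[i + 3] = a[i + 3] + b[i + 3];
    }
    for (; i < n; i++)          /* remainder loop */
        c[i] = a[i] + b[i];
}
```

Both functions produce the same results when c does not overlap a or b. The remainder loop is needed because the array length is not necessarily a multiple of the block size.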
Vectorizing loops can yield significant performance gains without programmer intervention, especially on large data sets. Vectorization can sometimes slow execution instead, because of pipeline synchronization, data-movement timing and other issues. Intel's MMX, SSE and AVX, Power Architecture's AltiVec, and ARM's NEON instruction sets support such vectorized loops.

Many constraints prevent or hinder vectorization. Loop dependence analysis identifies loops that can be vectorized by examining the data dependences among the instructions inside them.

Source: Wikipedia, the free encyclopedia.
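To see why dependence analysis matters, consider a loop in which each iteration reads the value written by the previous one. The C sketch below is a hypothetical illustration (the function names are invented here): the second function imitates a naive four-wide vector operation by loading all four inputs before storing any result, which is how a vector instruction behaves, and it computes a different answer from the sequential loop.

```c
#include <stddef.h>

/* Loop-carried dependence: iteration i reads a[i - 1], which
 * iteration i - 1 has just written. */
void running_sum(float *a, const float *b, size_t n)
{
    for (size_t i = 1; i < n; i++)
        a[i] = a[i - 1] + b[i];
}

/* Naive block-of-four version, shown only to illustrate the problem.
 * All four sums are computed from the old contents of a before any
 * store happens, so t1..t3 use stale values and the result is wrong. */
void running_sum_naive_blocks(float *a, const float *b, size_t n)
{
    size_t i = 1;
    for (; i + 4 <= n; i += 4) {
        float t0 = a[i - 1] + b[i];
        float t1 = a[i]     + b[i + 1];   /* stale a[i]     */
        float t2 = a[i + 1] + b[i + 2];   /* stale a[i + 1] */
        float t3 = a[i + 2] + b[i + 3];   /* stale a[i + 2] */
        a[i]     = t0;
        a[i + 1] = t1;
        a[i + 2] = t2;
        a[i + 3] = t3;
    }
    for (; i < n; i++)
        a[i] = a[i - 1] + b[i];
}
```

A vectorizing compiler must detect this kind of dependence and either leave the loop scalar or apply a different transformation. A loop such as c[i] = a[i] + b[i], where no iteration reads what another writes, has no such dependence and can be split into blocks safely.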